Credit Card Users Churn Prediction

Background & Context

Thera Bank recently saw a steep decline in the number of users of its credit card. Credit cards are a good source of income for banks because of the different kinds of fees they charge, such as annual fees, balance transfer fees, cash advance fees, late payment fees, foreign transaction fees, and others. Some fees are charged to every user irrespective of usage, while others are charged only under specified circumstances.

Customers leaving the credit card service would cause the bank a loss, so the bank wants to analyze its customer data, identify the customers who will leave the credit card service, and understand their reasons for leaving, so that it can improve in those areas.

As a data scientist at Thera Bank, you need to come up with a classification model that will help the bank improve its services so that customers do not renounce their credit cards.

Objective

Data Dictionary:

Load and view the dataset

Check the unique values of each column

Summary of the data

Check count of each unique category in variables
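
The inspection steps above (unique values, summary, and category counts) could be sketched with pandas roughly as follows. The toy rows and the `Attrition_Flag` labels are invented for illustration; only the column names come from this notebook.

```python
import pandas as pd

# Toy stand-in for the bank data; rows and label strings are assumptions.
df = pd.DataFrame({
    "Attrition_Flag": ["Existing Customer", "Attrited Customer",
                       "Existing Customer", "Existing Customer"],
    "Gender": ["M", "F", "F", "M"],
    "Card_Category": ["Blue", "Blue", "Silver", "Blue"],
    "Customer_Age": [45, 52, 38, 45],
})

# Number of distinct values in each column
unique_counts = df.nunique()

# Summary statistics for numeric and non-numeric columns alike
summary = df.describe(include="all")

# Frequency of each category in every non-numeric column
category_counts = {
    col: df[col].value_counts()
    for col in df.select_dtypes(exclude="number").columns
}
```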

EDA

Univariate

Bivariate Analysis

Data Preparation

- Convert the target variable, Attrition_Flag: 1 if the account is closed, else 0.

- Convert Gender, Card_Category, and Attrition_Flag to categorical.

- Drop columns (Avg_Open_To_Buy, Total_Trans_Ct, and Customer_Age).

- Convert "Unknown" to None so that the missing values can be imputed with KNN.
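
A minimal sketch of the preparation steps listed above, on a toy frame. The label string "Attrited Customer" and the `Education_Level` column are assumptions made for illustration, and `np.nan` is used as the missing-value marker in place of None.

```python
import numpy as np
import pandas as pd

# Toy data; values, label strings, and Education_Level are hypothetical.
df = pd.DataFrame({
    "Attrition_Flag": ["Existing Customer", "Attrited Customer", "Existing Customer"],
    "Gender": ["M", "F", "F"],
    "Card_Category": ["Blue", "Silver", "Blue"],
    "Education_Level": ["Graduate", "Unknown", "High School"],
    "Avg_Open_To_Buy": [1000.0, 2500.0, 300.0],
    "Total_Trans_Ct": [42, 20, 61],
    "Customer_Age": [45, 52, 38],
})

# Target: 1 if the account is closed (assumed label "Attrited Customer"), else 0
df["Attrition_Flag"] = (df["Attrition_Flag"] == "Attrited Customer").astype(int)

# Drop the columns named in the list above
df = df.drop(columns=["Avg_Open_To_Buy", "Total_Trans_Ct", "Customer_Age"])

# Mark "Unknown" as missing so an imputer can fill it later
df = df.replace("Unknown", np.nan)

# Convert the listed columns to the pandas categorical dtype
for col in ["Gender", "Card_Category", "Attrition_Flag"]:
    df[col] = df[col].astype("category")
```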

Split Data

- Treat missing values
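
One way to sketch the split followed by KNN imputation is shown below. The data, the column names `Income_Code` and `Credit_Limit`, the 70/30 split, and `n_neighbors=5` are all assumptions. The imputer is fitted on the training split only, so no information from the test split leaks into it.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical numeric features; categorical columns would first need
# a numeric encoding before KNNImputer can use them.
X = pd.DataFrame({
    "Income_Code": rng.integers(0, 5, size=100).astype(float),
    "Credit_Limit": rng.normal(5000, 1500, size=100),
})
X.loc[rng.choice(100, size=10, replace=False), "Income_Code"] = np.nan
y = pd.Series([1] * 20 + [0] * 80, name="Attrition_Flag")

# Stratify so both splits keep the same churn rate
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# Fit on train only, then apply the same imputer to test
imputer = KNNImputer(n_neighbors=5)
X_train_imp = pd.DataFrame(
    imputer.fit_transform(X_train), columns=X_train.columns, index=X_train.index
)
X_test_imp = pd.DataFrame(
    imputer.transform(X_test), columns=X_test.columns, index=X_test.index
)
```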

Encoding Categorical variables
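
A possible sketch of the encoding step uses one-hot encoding via `pd.get_dummies`. Whether the notebook uses this encoder is an assumption, and the toy frames are invented. The test columns are aligned to the training columns because a category may be absent from one split.

```python
import pandas as pd

# Toy splits; the values are invented for illustration.
X_train = pd.DataFrame({"Gender": ["M", "F", "F"],
                        "Card_Category": ["Blue", "Silver", "Blue"]})
X_test = pd.DataFrame({"Gender": ["F", "M"],
                       "Card_Category": ["Blue", "Blue"]})

# One-hot encode, dropping the first level of each variable
X_train_enc = pd.get_dummies(X_train, drop_first=True)
X_test_enc = pd.get_dummies(X_test, drop_first=True)

# Give the test frame exactly the training columns, filling absent ones with 0
X_test_enc = X_test_enc.reindex(columns=X_train_enc.columns, fill_value=0)
```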

Model Building

The model can make wrong predictions in two ways:

  1. Predicting that a customer will attrite (churn) when the customer does not churn - loss of resources
  2. Predicting that a customer won't churn when the customer does churn - loss of a customer

Which case is more important?

How can this loss be reduced, i.e., how can False Negatives be reduced?
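
False Negatives are the churners the model misses, and Recall = TP / (TP + FN) rises as they fall. The labels below are made up to illustrate how the two error types and Recall can be read off a confusion matrix with scikit-learn.

```python
from sklearn.metrics import confusion_matrix, recall_score

# Invented labels: 1 = customer churned, 0 = customer stayed
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]

# For binary labels, ravel() yields TN, FP, FN, TP in that order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Recall is the share of actual churners the model caught
recall = recall_score(y_true, y_pred)
```

Here `fp` counts the first error type (wasted retention effort) and `fn` counts the second (lost customers).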

Logistic Regression
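
A baseline along these lines might look like the sketch below. The synthetic data from `make_classification` (with roughly 16% positives, an arbitrary choice) stands in for the prepared churn data, and `max_iter=1000` is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the prepared churn data
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.84, 0.16], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

log_reg = LogisticRegression(max_iter=1000, random_state=1)
log_reg.fit(X_train, y_train)

# Recall on both splits; a large gap would hint at overfitting
train_recall = recall_score(y_train, log_reg.predict(X_train))
test_recall = recall_score(y_test, log_reg.predict(X_test))
```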

Oversampling train data using SMOTE

Logistic Regression on oversampled data

Undersampling train data using RandomUnderSampler

Logistic Regression on undersampled data

Model building - Bagging and Boosting

Decision Tree

RandomForest Classifier

Bagging Classifier
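
The three models in this group (decision tree, random forest, bagging) could be fitted and scored on Recall as sketched here. Default hyperparameters and synthetic data are assumptions; the notebook's actual settings are not shown in this outline.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the prepared churn data
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.84, 0.16], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=1),
    "Random Forest": RandomForestClassifier(random_state=1),
    "Bagging": BaggingClassifier(random_state=1),
}

# Store (train recall, test recall) per model to spot overfitting
recalls = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    recalls[name] = (
        recall_score(y_train, model.predict(X_train)),
        recall_score(y_test, model.predict(X_test)),
    )
```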

Model building - Boosting

AdaBoost Classifier

Gradient Boosting Classifier

XGBoost Classifier
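
A sketch of the boosting models on synthetic stand-in data. To keep the example dependent on scikit-learn only, XGBoost is left out here; `xgboost.XGBClassifier` exposes the same `fit`/`predict` interface and could be added to the dictionary in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the prepared churn data
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.84, 0.16], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Default hyperparameters are an assumption, not the notebook's settings
boosters = {
    "AdaBoost": AdaBoostClassifier(random_state=1),
    "Gradient Boosting": GradientBoostingClassifier(random_state=1),
}

# Store (train recall, test recall) per model
boost_recalls = {}
for name, model in boosters.items():
    model.fit(X_train, y_train)
    boost_recalls[name] = (
        recall_score(y_train, model.predict(X_train)),
        recall_score(y_test, model.predict(X_test)),
    )
```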

  1. Judged on Recall, the three best models are Bagging, AdaBoost, and Gradient Boosting: they do not overfit, and their Recall scores are the best among the models.

AdaBoost - GridSearchCV
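
A grid search tuned for Recall might be set up as sketched below. The parameter grid, `cv=3`, and the synthetic data are illustrative assumptions; the same pattern would apply to the Gradient Boosting and Bagging models by swapping the estimator and grid.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the prepared churn data
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.84, 0.16], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Small hypothetical grid; every combination is tried
param_grid = {
    "n_estimators": [10, 50],
    "learning_rate": [0.1, 1.0],
}

grid = GridSearchCV(
    AdaBoostClassifier(random_state=1),
    param_grid,
    scoring="recall",  # select the combination with the best mean CV recall
    cv=3,
)
grid.fit(X_train, y_train)

best_params = grid.best_params_
best_cv_recall = grid.best_score_
```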

Gradient Boost - GridSearchCV

Bagging - GridSearchCV

AdaBoost - RandomizedSearchCV
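
Unlike a grid search, a randomized search samples a fixed number of parameter settings from distributions. The distributions, `n_iter=5`, `cv=3`, and the synthetic data in this sketch are assumptions.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the prepared churn data
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.84, 0.16], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Hypothetical search space
param_distributions = {
    "n_estimators": randint(10, 60),       # integers from 10 to 59
    "learning_rate": uniform(0.05, 1.0),   # floats in [0.05, 1.05]
}

search = RandomizedSearchCV(
    AdaBoostClassifier(random_state=1),
    param_distributions,
    n_iter=5,          # number of sampled settings
    scoring="recall",
    cv=3,
    random_state=1,
)
search.fit(X_train, y_train)

best_params = search.best_params_
best_cv_recall = search.best_score_
```

The cost is fixed by `n_iter` rather than by the size of the grid, which is the usual reason to prefer it when the search space is large.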

Gradient Boost - RandomizedSearchCV

Bagging - RandomizedSearchCV

Model Performances

Comparing all models
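
One way to lay the models side by side is a small table of train and test Recall plus the gap between them. The two models and the synthetic data below are placeholders; in practice the tuned models from the earlier sections would go into the dictionary.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the prepared churn data
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.84, 0.16], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Placeholder models for illustration
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000, random_state=1),
    "Decision Tree (depth 3)": DecisionTreeClassifier(max_depth=3, random_state=1),
}

rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    rows.append({
        "Model": name,
        "Train_Recall": recall_score(y_train, model.predict(X_train)),
        "Test_Recall": recall_score(y_test, model.predict(X_test)),
    })

comparison = pd.DataFrame(rows).set_index("Model")
# A large positive gap suggests the model overfits the training data
comparison["Gap"] = comparison["Train_Recall"] - comparison["Test_Recall"]
comparison = comparison.sort_values("Test_Recall", ascending=False)
```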

Actionable Insights & Recommendations

Business recommendations